{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.018414,
     "end_time": "2021-02-13T17:30:13.212232",
     "exception": false,
     "start_time": "2021-02-13T17:30:13.193818",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "This notebook contains an example for teaching."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.016442,
     "end_time": "2021-02-13T17:30:13.246048",
     "exception": false,
     "start_time": "2021-02-13T17:30:13.229606",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# A Case Study: The Effect of Gun Ownership on Gun-Homicide Rates"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.017996,
     "end_time": "2021-02-13T17:30:13.281738",
     "exception": false,
     "start_time": "2021-02-13T17:30:13.263742",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We consider the problem of estimating the effect of gun\n",
    "ownership on the homicide rate. For this purpose, we estimate the following partially\n",
    "linear model\n",
    "\n",
    "$$\n",
    " Y_{j,t} = \\beta D_{j,(t-1)} + g(Z_{j,t}) + \\epsilon_{j,t}.\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.016649,
     "end_time": "2021-02-13T17:30:13.315270",
     "exception": false,
     "start_time": "2021-02-13T17:30:13.298621",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.018756,
     "end_time": "2021-02-13T17:30:13.351290",
     "exception": false,
     "start_time": "2021-02-13T17:30:13.332534",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "$Y_{j,t}$ is log homicide rate in county $j$ at time $t$, $D_{j, t-1}$ is log  fraction of suicides committed with a firearm in county $j$ at time $t-1$, which we use as a proxy for gun ownership,  and  $Z_{j,t}$ is a set of demographic and economic characteristics of county $j$ at time $t$. The parameter $\\beta$ is the effect of gun ownership on the\n",
    "homicide rates, controlling for county-level demographic and economic characteristics. \n",
    "\n",
    "The sample covers 195 large United States counties between the years 1980 through 1999, giving us 3900 observations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 1.027605,
     "end_time": "2021-02-13T17:30:14.397806",
     "exception": false,
     "start_time": "2021-02-13T17:30:13.370201",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "data <- read.csv(\"../input/gun-example/gun_clean.csv\") \n",
    "dim(data)[1]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.017216,
     "end_time": "2021-02-13T17:30:14.432656",
     "exception": false,
     "start_time": "2021-02-13T17:30:14.415440",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### Preprocessing"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.017212,
     "end_time": "2021-02-13T17:30:14.467075",
     "exception": false,
     "start_time": "2021-02-13T17:30:14.449863",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "To account for heterogeneity across counties and time trends in  all variables, we remove from them county-specific and time-specific effects in the following preprocessing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 24.410531,
     "end_time": "2021-02-13T17:30:38.894264",
     "exception": false,
     "start_time": "2021-02-13T17:30:14.483733",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#################################  Find Variable Names from Dataset ########################\n",
    "\n",
    "varlist <- function (df=NULL,type=c(\"numeric\",\"factor\",\"character\"), pattern=\"\", exclude=NULL) {\n",
    "  vars <- character(0)\n",
    "  if (any(type %in% \"numeric\")) {\n",
    "    vars <- c(vars,names(df)[sapply(df,is.numeric)])\n",
    "  }\n",
    "  if (any(type %in% \"factor\")) {\n",
    "    vars <- c(vars,names(df)[sapply(df,is.factor)])\n",
    "  }  \n",
    "  if (any(type %in% \"character\")) {\n",
    "    vars <- c(vars,names(df)[sapply(df,is.character)])\n",
    "  }  \n",
    "  vars[(!vars %in% exclude) & grepl(vars,pattern=pattern)]\n",
    "}\n",
    "\n",
    "################################# Create Variables ###############################\n",
    "\n",
    "\n",
    "# Dummy Variables for Year and County Fixed Effects\n",
    "fixed  <- grep(\"X_Jfips\", names(data), value=TRUE, fixed=TRUE)\n",
    "year   <- varlist(data, pattern=\"X_Tyear\")\n",
    "\n",
    "# Census Control Variables\n",
    "census     <- NULL\n",
    "census_var <- c(\"^AGE\", \"^BN\", \"^BP\", \"^BZ\", \"^ED\", \"^EL\",\"^HI\", \"^HS\", \"^INC\", \"^LF\", \"^LN\", \"^PI\", \"^PO\", \"^PP\", \"^PV\", \"^SPR\", \"^VS\")\n",
    "\n",
    "for(i in 1:length(census_var)){\n",
    "  \n",
    "  census  <- append(census, varlist(data, pattern=census_var[i]))\n",
    "  \n",
    "}\n",
    "\n",
    "################################ Variables ##################################\n",
    "# Treatment Variable\n",
    "d     <- \"logfssl\"\n",
    "\n",
    "# Outcome Variable\n",
    "y     <- \"logghomr\"\n",
    "\n",
    "# Other Control Variables\n",
    "X1    <- c(\"logrobr\", \"logburg\", \"burg_missing\", \"robrate_missing\")\n",
    "X2    <- c(\"newblack\", \"newfhh\", \"newmove\", \"newdens\", \"newmal\")\n",
    "\n",
    "#################################  Partial out Fixed Effects ########################\n",
    "\n",
    "# New Dataset for Partiled-out Variables\n",
    "rdata    <- as.data.frame(data$CountyCode) \n",
    "colnames(rdata) <- \"CountyCode\"\n",
    "\n",
    "# Variables to be Partialled-out\n",
    "varlist <- c(y, d,X1, X2, census)\n",
    "\n",
    "\n",
    "# Partial out Variables in varlist from year and county fixed effect\n",
    "for(i in 1:length(varlist)){\n",
    "  form <- as.formula(paste(varlist[i], \"~\", paste(paste(year,collapse=\"+\"),  paste(fixed,collapse=\"+\"), sep=\"+\")))\n",
    "  rdata[, varlist[i]] <- lm(form, data)$residuals\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.016789,
     "end_time": "2021-02-13T17:30:38.928857",
     "exception": false,
     "start_time": "2021-02-13T17:30:38.912068",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Now, we can construct the treatment variable, the outcome variable and the matrix $Z$ that includes the control variables."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.042922,
     "end_time": "2021-02-13T17:30:38.988875",
     "exception": false,
     "start_time": "2021-02-13T17:30:38.945953",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Treatment Variable\n",
    "D     <- rdata[which(colnames(rdata) == d)]\n",
    "\n",
    "# Outcome Variable\n",
    "Y     <- rdata[which(colnames(rdata) == y)]\n",
    "\n",
    "# Construct matrix Z\n",
    "\n",
    "Z <- rdata[which(colnames(rdata) %in% c(X1,X2,census))]\n",
    "\n",
    "dim(Z)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.01807,
     "end_time": "2021-02-13T17:30:39.025018",
     "exception": false,
     "start_time": "2021-02-13T17:30:39.006948",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We have in total 195 control variables. The control variables $Z_{j,t}$ are from the U.S. Census Bureau and  contain demographic and economic characteristics of the counties such as  the age distribution, the income distribution, crime rates, federal spending, home ownership rates, house prices, educational attainment, voting paterns, employment statistics, and migration rates. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.035747,
     "end_time": "2021-02-13T17:30:39.078439",
     "exception": false,
     "start_time": "2021-02-13T17:30:39.042692",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "clu <- rdata[which(colnames(rdata) == \"CountyCode\")] #for clustering the standard errors\n",
    "data <- data.frame(cbind(Y, D, Z,as.matrix(clu)))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 1.126576,
     "end_time": "2021-02-13T17:30:40.222803",
     "exception": false,
     "start_time": "2021-02-13T17:30:39.096227",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "library(lfe)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.01855,
     "end_time": "2021-02-13T17:30:40.260049",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.241499",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## The effect of gun ownership"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.018478,
     "end_time": "2021-02-13T17:30:40.297035",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.278557",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### OLS"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.018386,
     "end_time": "2021-02-13T17:30:40.333613",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.315227",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "After preprocessing the data, we first look at simple regression of $Y_{j,t}$ on $D_{j,t-1}$ without controls as a baseline model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.136101,
     "end_time": "2021-02-13T17:30:40.488367",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.352266",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#baseline_formula <- as.formula(paste(y, \"~\", d ))\n",
    "#baseline.ols <- lm(baseline_formula,data=rdata)\n",
    "\n",
    "baseline.ols <- felm(logghomr ~ logfssl |0|0| CountyCode,data=data) # ols with clustered standard errors\n",
    "est_baseline <- summary(baseline.ols)$coef[2,]\n",
    "confint(baseline.ols)[2,]\n",
    "est_baseline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.019514,
     "end_time": "2021-02-13T17:30:40.528479",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.508965",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The point estimate is $0.282$ with the confidence interval ranging from 0.155 to 0.41. This\n",
    "suggests that increases in gun ownership rates are related to gun homicide rates - if gun ownership increases by 1% relative\n",
    "to a trend then the predicted gun homicide rate goes up by 0.28%, without controlling for counties' characteristics.\n",
    "\n",
    "Since our goal is to estimate the effect of gun ownership after controlling for a rich set county characteristics we next include the controls. First, we estimate the model by ols and then by an array of the modern regression methods using the double machine learning approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.229579,
     "end_time": "2021-02-13T17:30:40.777425",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.547846",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "control_formula <- as.formula(paste(\"logghomr\", \"~\", paste(\"logfssl\",paste(colnames(Z),collapse=\"+\"),\n",
    "                                                           sep=\"+\"),\"|0|0| CountyCode\"))\n",
    "control.ols <- felm(control_formula,data=data)\n",
    "est_ols <- summary(control.ols)$coef[2,]\n",
    "confint(control.ols)[2,]\n",
    "est_ols"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.039309,
     "end_time": "2021-02-13T17:30:40.858783",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.819474",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "After controlling for a rich set of characteristics, the point estimate of gun ownership reduces to $0.19$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.022083,
     "end_time": "2021-02-13T17:30:40.902070",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.879987",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "# DML algorithm\n",
    "\n",
    "Here we perform inference of the predictive coefficient $\\beta$ in our partially linear statistical model, \n",
    "\n",
    "$$\n",
    "Y = D\\beta + g(Z) + \\epsilon, \\quad E (\\epsilon | D, Z) = 0,\n",
    "$$\n",
    "\n",
    "using the **double machine learning** approach. \n",
    "\n",
    "For $\\tilde Y = Y- E(Y|Z)$ and $\\tilde D= D- E(D|Z)$, we can write\n",
    "$$\n",
    "\\tilde Y = \\alpha \\tilde D + \\epsilon, \\quad E (\\epsilon |\\tilde D) =0.\n",
    "$$\n",
    "\n",
    "Using cross-fitting, we employ modern regression methods\n",
    "to build estimators $\\hat \\ell(Z)$ and $\\hat m(Z)$ of $\\ell(Z):=E(Y|Z)$ and $m(Z):=E(D|Z)$ to obtain the estimates of the residualized quantities:\n",
    "\n",
    "$$\n",
    "\\tilde Y_i = Y_i  - \\hat \\ell (Z_i),   \\quad \\tilde D_i = D_i - \\hat m(Z_i), \\quad \\text{ for each } i = 1,\\dots,n.\n",
    "$$\n",
    "\n",
    "Finally, using ordinary least squares of $\\tilde Y_i$ on $\\tilde D_i$, we obtain the \n",
    "estimate of $\\beta$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.021101,
     "end_time": "2021-02-13T17:30:40.944373",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.923272",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "The following algorithm comsumes $Y, D, Z$, and a machine learning method for learning the residuals $\\tilde Y$ and $\\tilde D$, where the residuals are obtained by cross-validation (cross-fitting). Then, it prints the estimated coefficient $\\beta$ and the corresponding standard error from the final OLS regression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.095479,
     "end_time": "2021-02-13T17:30:41.060587",
     "exception": false,
     "start_time": "2021-02-13T17:30:40.965108",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "DML2.for.PLM <- function(z, d, y, dreg, yreg, nfold=2, clu) {\n",
    "  nobs <- nrow(z) #number of observations\n",
    "  foldid <- rep.int(1:nfold,times = ceiling(nobs/nfold))[sample.int(nobs)] #define folds indices\n",
    "  I <- split(1:nobs, foldid)  #split observation indices into folds  \n",
    "  ytil <- dtil <- rep(NA, nobs)\n",
    "  cat(\"fold: \")\n",
    "  for(b in 1:length(I)){\n",
    "    dfit <- dreg(z[-I[[b]],], d[-I[[b]]]) #take a fold out\n",
    "    yfit <- yreg(z[-I[[b]],], y[-I[[b]]]) # take a foldt out\n",
    "    dhat <- predict(dfit, z[I[[b]],], type=\"response\") #predict the left-out fold \n",
    "    yhat <- predict(yfit, z[I[[b]],], type=\"response\") #predict the left-out fold  \n",
    "    dtil[I[[b]]] <- (d[I[[b]]] - dhat) #record residual for the left-out fold\n",
    "    ytil[I[[b]]] <- (y[I[[b]]] - yhat) #record residial for the left-out fold\n",
    "    cat(b,\" \")\n",
    "        }\n",
    "  #rfit <- lm(ytil ~ dtil)    #estimate the main parameter by regressing one residual on the other\n",
    "  data <- data.frame(cbind(ytil, dtil, as.matrix(clu)))\n",
    "  rfit <- felm(ytil ~ dtil|0|0|CountyCode,data=data) \n",
    "  coef.est <- coef(rfit)[2]  #extract coefficient\n",
    "  #HC <- vcovHC(rfit)\n",
    "  se    <- summary(rfit,robust=T)$coefficients[2,2] #record robust standard error by County\n",
    "  cat(sprintf(\"\\ncoef (se) = %g (%g)\\n\", coef.est , se))  #printing output\n",
    "  return( list(coef.est =coef.est , se=se, dtil=dtil, ytil=ytil, rfit=rfit) ) #save output and residuals \n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.020604,
     "end_time": "2021-02-13T17:30:41.102146",
     "exception": false,
     "start_time": "2021-02-13T17:30:41.081542",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Now, we apply the Double Machine Learning (DML) approach with different machine learning methods. First, we load the relevant libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.486944,
     "end_time": "2021-02-13T17:30:41.610047",
     "exception": false,
     "start_time": "2021-02-13T17:30:41.123103",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "library(hdm)\n",
    "library(glmnet)\n",
    "library(sandwich)\n",
    "library(randomForest)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.022531,
     "end_time": "2021-02-13T17:30:41.655799",
     "exception": false,
     "start_time": "2021-02-13T17:30:41.633268",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Let us, construct the input matrices."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.071827,
     "end_time": "2021-02-13T17:30:41.750537",
     "exception": false,
     "start_time": "2021-02-13T17:30:41.678710",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "y <- as.matrix(Y)\n",
    "d <- as.matrix(D)\n",
    "z <- as.matrix(Z)\n",
    "clu <- rdata[which(colnames(rdata) == \"CountyCode\")]\n",
    "head(data.frame(cbind(y,d,as.matrix(clu))))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.022813,
     "end_time": "2021-02-13T17:30:41.796577",
     "exception": false,
     "start_time": "2021-02-13T17:30:41.773764",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "In the following, we apply the DML approach with the differnt versions of lasso.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.02306,
     "end_time": "2021-02-13T17:30:41.843131",
     "exception": false,
     "start_time": "2021-02-13T17:30:41.820071",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "## Lasso"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 25.497739,
     "end_time": "2021-02-13T17:31:07.363059",
     "exception": false,
     "start_time": "2021-02-13T17:30:41.865320",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#DML with Lasso:\n",
    "set.seed(123)\n",
    "dreg <- function(z,d){ rlasso(z,d, post=FALSE) } #ML method= lasso from hdm \n",
    "yreg <- function(z,y){ rlasso(z,y, post=FALSE) } #ML method = lasso from hdm\n",
    "DML2.lasso = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10,clu)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 25.88845,
     "end_time": "2021-02-13T17:31:33.296252",
     "exception": false,
     "start_time": "2021-02-13T17:31:07.407802",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#DML with Post-Lasso:\n",
    "dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm \n",
    "yreg <- function(z,y){ rlasso(z,y, post=T) } #ML method = lasso from hdm\n",
    "DML2.post = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 214.221131,
     "end_time": "2021-02-13T17:35:07.562866",
     "exception": false,
     "start_time": "2021-02-13T17:31:33.341735",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#DML with cross-validated Lasso:\n",
    "dreg <- function(z,d){ cv.glmnet(z,d,family=\"gaussian\", alpha=1) } #ML method = lasso from glmnet \n",
    "yreg <- function(z,y){ cv.glmnet(z,y,family=\"gaussian\", alpha=1) }  #ML method = lasso from glmnet \n",
    "DML2.lasso.cv = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)\n",
    "\n",
    "dreg <- function(z,d){ cv.glmnet(z,d,family=\"gaussian\", alpha=0.5) } #ML method = elastic net from glmnet \n",
    "yreg <- function(z,y){ cv.glmnet(z,y,family=\"gaussian\", alpha=0.5) }  #ML method = elastic net from glmnet \n",
    "DML2.elnet = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)\n",
    "\n",
    "dreg <- function(z,d){ cv.glmnet(z,d,family=\"gaussian\", alpha=0) } #ML method = ridge from glmnet \n",
    "yreg <- function(z,y){ cv.glmnet(z,y,family=\"gaussian\", alpha=0) }  #ML method = ridge from glmnet \n",
    "DML2.ridge = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.025906,
     "end_time": "2021-02-13T17:35:07.615290",
     "exception": false,
     "start_time": "2021-02-13T17:35:07.589384",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Here we also compute DML with OLS used as the ML method"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 6.523989,
     "end_time": "2021-02-13T17:35:14.164734",
     "exception": false,
     "start_time": "2021-02-13T17:35:07.640745",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "dreg <- function(z,d){  glmnet(z,d,family=\"gaussian\", lambda=0) } #ML method = ols from glmnet \n",
    "yreg <- function(z,y){  glmnet(z,y,family=\"gaussian\", lambda=0) }  #ML method = ols from glmnet \n",
    "DML2.ols = DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.024512,
     "end_time": "2021-02-13T17:35:14.215612",
     "exception": false,
     "start_time": "2021-02-13T17:35:14.191100",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Next, we also apply Random Forest for comparison purposes."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.025256,
     "end_time": "2021-02-13T17:35:14.266042",
     "exception": false,
     "start_time": "2021-02-13T17:35:14.240786",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "### Random Forest\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 334.261061,
     "end_time": "2021-02-13T17:40:48.551591",
     "exception": false,
     "start_time": "2021-02-13T17:35:14.290530",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "#DML with Random Forest:\n",
    "dreg <- function(z,d){ randomForest(z, d) } #ML method=Forest \n",
    "yreg <- function(z,y){ randomForest(z, y) } #ML method=Forest\n",
    "set.seed(1)\n",
    "DML2.RF = DML2.for.PLM(z, d, y, dreg, yreg, nfold=2, clu) # set to 2 due to computation time"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.024871,
     "end_time": "2021-02-13T17:40:48.605021",
     "exception": false,
     "start_time": "2021-02-13T17:40:48.580150",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "We conclude that the gun ownership rates are related to gun homicide rates - if gun ownership increases by 1% relative\n",
    "to a trend then the predicted gun homicide rate goes up by about 0.20% controlling for counties' characteristics."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.027238,
     "end_time": "2021-02-13T17:40:48.657096",
     "exception": false,
     "start_time": "2021-02-13T17:40:48.629858",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Finally, let's see which method is actually better. We compute RMSE for predicting D and Y, and see which\n",
    "of the methods works better.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.087953,
     "end_time": "2021-02-13T17:40:48.775469",
     "exception": false,
     "start_time": "2021-02-13T17:40:48.687516",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "mods<- list(DML2.ols, DML2.lasso, DML2.post, DML2.lasso.cv, DML2.ridge, DML2.elnet, DML2.RF)\n",
    "\n",
    "RMSE.mdl<- function(mdl) {\n",
    "RMSEY <- sqrt(mean(mdl$ytil)^2) \n",
    "RMSED <- sqrt(mean(mdl$dtil)^2) \n",
    "return( list(RMSEY=RMSEY, RMSED=RMSED))\n",
    "}\n",
    "\n",
    "#RMSE.mdl(DML2.lasso)\n",
    "\n",
    "#DML2.lasso$ytil\n",
    "\n",
    "Res<- lapply(mods, RMSE.mdl)\n",
    "\n",
    "\n",
    "prRes.Y<- c( Res[[1]]$RMSEY,Res[[2]]$RMSEY, Res[[3]]$RMSEY, Res[[4]]$RMSEY, Res[[5]]$RMSEY,  Res[[6]]$RMSEY, Res[[7]]$RMSEY)\n",
    "prRes.D<- c( Res[[1]]$RMSED,Res[[2]]$RMSED, Res[[3]]$RMSED, Res[[4]]$RMSED, Res[[5]]$RMSED, Res[[6]]$RMSED, Res[[7]]$RMSED)\n",
    "\n",
    "prRes<- rbind(prRes.Y, prRes.D); \n",
    "rownames(prRes)<- c(\"RMSE D\", \"RMSE Y\");\n",
    "colnames(prRes)<- c(\"OLS\", \"Lasso\", \"Post-Lasso\", \"CV Lasso\", \"CV Ridge\", \"CV Elnet\", \"RF\")\n",
    "print(prRes,digit=6)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.027155,
     "end_time": "2021-02-13T17:40:48.832101",
     "exception": false,
     "start_time": "2021-02-13T17:40:48.804946",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "It looks like the best method for predicting D is Lasso, and the best method for predicting Y is CV Ridge.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 29.950099,
     "end_time": "2021-02-13T17:41:18.807192",
     "exception": false,
     "start_time": "2021-02-13T17:40:48.857093",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "dreg <- function(z,d){ rlasso(z,d, post=T) } #ML method= lasso from hdm \n",
    "yreg <- function(z,y){ cv.glmnet(z,y,family=\"gaussian\", alpha=0) }  #ML method = ridge from glmnet \n",
    "DML2.best= DML2.for.PLM(z, d, y, dreg, yreg, nfold=10, clu)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "papermill": {
     "duration": 0.034891,
     "end_time": "2021-02-13T17:41:18.892429",
     "exception": false,
     "start_time": "2021-02-13T17:41:18.857538",
     "status": "completed"
    },
    "tags": []
   },
   "source": [
    "Let's organize the results in a table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.088669,
     "end_time": "2021-02-13T17:41:19.006521",
     "exception": false,
     "start_time": "2021-02-13T17:41:18.917852",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "library(xtable)\n",
    "\n",
    "table <- matrix(0,9,2)\n",
    "table[1,1] <- as.numeric(est_baseline[1])\n",
    "table[2,1] <- as.numeric(est_ols[1])\n",
    "table[3,1]   <- as.numeric(DML2.lasso$coef.est)\n",
    "table[4,1]   <- as.numeric(DML2.post$coef.est)\n",
    "table[5,1]  <-as.numeric(DML2.lasso.cv$coef.est)\n",
    "table[6,1] <-as.numeric(DML2.elnet$coef.est)\n",
    "table[7,1] <-as.numeric(DML2.ridge$coef.est)\n",
    "table[8,1] <-as.numeric(DML2.RF$coef.est)\n",
    "table[9,1] <-as.numeric(DML2.best$coef.est)\n",
    "table[1,2] <- as.numeric(est_baseline[2])\n",
    "table[2,2] <- as.numeric(est_ols[2])\n",
    "table[3,2]   <- as.numeric(DML2.lasso$se)\n",
    "table[4,2]   <- as.numeric(DML2.post$se)\n",
    "table[5,2]  <-as.numeric(DML2.lasso.cv$se)\n",
    "table[6,2] <-as.numeric(DML2.elnet$se)\n",
    "table[7,2] <-as.numeric(DML2.ridge$se)\n",
    "table[8,2] <-as.numeric(DML2.RF$se)\n",
    "table[9,2] <-as.numeric(DML2.best$se)\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "################################# Print Results #################################\n",
    "\n",
    "colnames(table) <- c(\"Estimate\",\"Standard Error\")\n",
    "rownames(table) <- c(\"Baseline OLS\", \"Least Squares with controls\", \"Lasso\", \"Post-Lasso\", \"CV Lasso\",\"CV Elnet\", \"CV Ridge\", \"Random Forest\", \n",
    "                     \"Best\")\n",
    "\n",
    "table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.043017,
     "end_time": "2021-02-13T17:41:19.075707",
     "exception": false,
     "start_time": "2021-02-13T17:41:19.032690",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "print(table, digit=3)\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "papermill": {
     "duration": 0.052331,
     "end_time": "2021-02-13T17:41:19.154690",
     "exception": false,
     "start_time": "2021-02-13T17:41:19.102359",
     "status": "completed"
    },
    "tags": []
   },
   "outputs": [],
   "source": [
    "tab<- xtable(table, digits=3)\n",
    "print(tab, type=\"latex\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.6.3"
  },
  "papermill": {
   "default_parameters": {},
   "duration": 669.535702,
   "end_time": "2021-02-13T17:41:19.292106",
   "environment_variables": {},
   "exception": null,
   "input_path": "__notebook__.ipynb",
   "output_path": "__notebook__.ipynb",
   "parameters": {},
   "start_time": "2021-02-13T17:30:09.756404",
   "version": "2.2.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
